Overview

Dataset statistics

Statistic | Dataset A | Dataset B
Number of variables | 12 | 12
Number of observations | 446 | 446
Missing cells | 443 | 437
Missing cells (%) | 8.3% | 8.2%
Duplicate rows | 0 | 0
Duplicate rows (%) | 0.0% | 0.0%
Total size in memory | 45.3 KiB | 45.3 KiB
Average record size in memory | 104.0 B | 104.0 B
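The derived rows of the dataset statistics follow from the raw counts. The sketch below is a minimal check, assuming the missing-cell percentage is missing cells divided by variables times observations and that 1 KiB is 1024 bytes; the function names are illustrative and are not part of ydata-profiling.

```python
def missing_cell_pct(missing_cells: int, n_variables: int, n_observations: int) -> float:
    """Share of missing cells among all cells, in percent."""
    total_cells = n_variables * n_observations
    return 100.0 * missing_cells / total_cells


def avg_record_size_bytes(total_kib: float, n_observations: int) -> float:
    """Average bytes per row, assuming 1 KiB = 1024 bytes."""
    return total_kib * 1024 / n_observations


# Counts taken from the dataset statistics table.
pct_a = missing_cell_pct(443, 12, 446)
pct_b = missing_cell_pct(437, 12, 446)
record_size = avg_record_size_bytes(45.3, 446)
print(f"{pct_a:.1f}% {pct_b:.1f}% {record_size:.1f} B")
```

Rounded to one decimal, these values agree with the 8.3%, 8.2% and 104.0 B reported above.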

Variable types

Type | Dataset A | Dataset B
Numeric | 5 | 5
Categorical | 4 | 4
Text | 3 | 3

Alerts

Dataset A | Dataset B | Alert type
Age has 91 (20.4%) missing values | Age has 91 (20.4%) missing values | Missing
Cabin has 351 (78.7%) missing values | Cabin has 345 (77.4%) missing values | Missing
PassengerId has unique values | PassengerId has unique values | Unique
Name has unique values | Name has unique values | Unique
SibSp has 297 (66.6%) zeros | SibSp has 302 (67.7%) zeros | Zeros
Parch has 345 (77.4%) zeros | Parch has 338 (75.8%) zeros | Zeros
Fare has 12 (2.7%) zeros | Alert not present in this dataset | Zeros
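One way to spot rows such as "Alert not present in this dataset" is to compare the two alert lists as sets. The sketch below transcribes the alerts by hand as (variable, alert type) pairs; this representation is an assumption made for illustration and is not how ydata-profiling stores alerts internally.

```python
# Alerts as (variable, alert type) pairs, transcribed from the Alerts table.
alerts_a = {
    ("Age", "Missing"),
    ("Cabin", "Missing"),
    ("PassengerId", "Unique"),
    ("Name", "Unique"),
    ("SibSp", "Zeros"),
    ("Parch", "Zeros"),
    ("Fare", "Zeros"),
}
alerts_b = {
    ("Age", "Missing"),
    ("Cabin", "Missing"),
    ("PassengerId", "Unique"),
    ("Name", "Unique"),
    ("SibSp", "Zeros"),
    ("Parch", "Zeros"),
}

# Set differences isolate alerts raised for only one of the two datasets.
only_in_a = alerts_a - alerts_b
only_in_b = alerts_b - alerts_a
print("Only in Dataset A:", sorted(only_in_a))
print("Only in Dataset B:", sorted(only_in_b))
```

Here the only difference is the Zeros alert on Fare, which is raised for Dataset A (12 zeros) but not for Dataset B (3 zeros).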

Reproduction

Field | Dataset A | Dataset B
Analysis started | 2023-12-05 15:50:46.515326 | 2023-12-05 15:50:50.896896
Analysis finished | 2023-12-05 15:50:50.895633 | 2023-12-05 15:50:54.741189
Duration | 4.38 seconds | 3.84 seconds
Software version | ydata-profiling v0.0.dev0 | ydata-profiling v0.0.dev0
Download configuration | config.json | config.json
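The Duration row can be recomputed from the start and finish timestamps. This is a small sketch using only the standard library; the helper name `duration_seconds` is hypothetical.

```python
from datetime import datetime

# Timestamp layout used in the Reproduction table.
FMT = "%Y-%m-%d %H:%M:%S.%f"


def duration_seconds(started: str, finished: str) -> float:
    """Elapsed time between two timestamps, in seconds."""
    delta = datetime.strptime(finished, FMT) - datetime.strptime(started, FMT)
    return delta.total_seconds()


dur_a = duration_seconds("2023-12-05 15:50:46.515326", "2023-12-05 15:50:50.895633")
dur_b = duration_seconds("2023-12-05 15:50:50.896896", "2023-12-05 15:50:54.741189")
print(f"{dur_a:.2f} s, {dur_b:.2f} s")
```

The timestamps also show that Dataset B was profiled right after Dataset A finished.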

Variables

PassengerId
Real number (ℝ)

Statistic | Dataset A | Dataset B
Distinct | 446 | 446
Distinct (%) | 100.0% | 100.0%
Missing | 0 | 0
Missing (%) | 0.0% | 0.0%
Infinite | 0 | 0
Infinite (%) | 0.0% | 0.0%
Mean | 432.40135 | 448.72422
Minimum | 1 | 6
Maximum | 889 | 891
Zeros | 0 | 0
Zeros (%) | 0.0% | 0.0%
Negative | 0 | 0
Negative (%) | 0.0% | 0.0%
Memory size | 7.0 KiB | 7.0 KiB

Quantile statistics

Statistic | Dataset A | Dataset B
Minimum | 1 | 6
5-th percentile | 37.25 | 52.25
Q1 | 203.75 | 229.5
median | 432.5 | 441.5
Q3 | 666.75 | 673.5
95-th percentile | 832.5 | 849.75
Maximum | 889 | 891
Range | 888 | 885
Interquartile range (IQR) | 463 | 444
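Range and IQR are simple differences of other rows in the quantile table, which also gives a quick way to confirm how the fused columns split. A minimal sketch (helper names are illustrative):

```python
def value_range(minimum: float, maximum: float) -> float:
    """Range is the spread between the largest and smallest value."""
    return maximum - minimum


def iqr(q1: float, q3: float) -> float:
    """Interquartile range is Q3 minus Q1."""
    return q3 - q1


# PassengerId quantiles for Dataset A and Dataset B.
range_a, iqr_a = value_range(1, 889), iqr(203.75, 666.75)
range_b, iqr_b = value_range(6, 891), iqr(229.5, 673.5)
print(range_a, iqr_a, range_b, iqr_b)
```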

Descriptive statistics

Statistic | Dataset A | Dataset B
Standard deviation | 259.54475 | 257.57268
Coefficient of variation (CV) | 0.60024038 | 0.57401111
Kurtosis | -1.243708 | -1.2069199
Mean | 432.40135 | 448.72422
Median Absolute Deviation (MAD) | 232.5 | 224.5
Skewness | 0.010447404 | 0.024528039
Sum | 192851 | 200131
Variance | 67363.477 | 66343.688
Monotonicity | Not monotonic | Not monotonic

Histogram with fixed size bins (bins=50)
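Several descriptive statistics are tied together by simple identities: CV is standard deviation over mean, variance is the squared standard deviation, and the sum is the mean times the number of non-missing values. The sketch below checks these for PassengerId using the rounded figures reported above, so agreement is only expected up to rounding; the variable names are illustrative.

```python
N_OBS = 446  # PassengerId has no missing values in either dataset

# (standard deviation, mean) as reported for each dataset
std_a, mean_a = 259.54475, 432.40135
std_b, mean_b = 257.57268, 448.72422

cv_a = std_a / mean_a        # coefficient of variation
cv_b = std_b / mean_b
var_a = std_a ** 2           # variance is the squared standard deviation
var_b = std_b ** 2
total_a = mean_a * N_OBS     # sum = mean * count
total_b = mean_b * N_OBS

print(f"CV: {cv_a:.6f} {cv_b:.6f}")
print(f"Variance: {var_a:.2f} {var_b:.2f}")
print(f"Sum: {total_a:.0f} {total_b:.0f}")
```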
Dataset A
Value | Count | Frequency (%)
512 | 1 | 0.2%
245 | 1 | 0.2%
399 | 1 | 0.2%
753 | 1 | 0.2%
150 | 1 | 0.2%
547 | 1 | 0.2%
503 | 1 | 0.2%
475 | 1 | 0.2%
109 | 1 | 0.2%
517 | 1 | 0.2%
Other values (436) | 436 | 97.8%

Dataset B
Value | Count | Frequency (%)
499 | 1 | 0.2%
693 | 1 | 0.2%
758 | 1 | 0.2%
195 | 1 | 0.2%
72 | 1 | 0.2%
542 | 1 | 0.2%
306 | 1 | 0.2%
132 | 1 | 0.2%
715 | 1 | 0.2%
593 | 1 | 0.2%
Other values (436) | 436 | 97.8%

Dataset A
Value | Count | Frequency (%)
1 | 1 | 0.2%
2 | 1 | 0.2%
3 | 1 | 0.2%
4 | 1 | 0.2%
5 | 1 | 0.2%
6 | 1 | 0.2%
9 | 1 | 0.2%
11 | 1 | 0.2%
15 | 1 | 0.2%
17 | 1 | 0.2%

Dataset B
Value | Count | Frequency (%)
6 | 1 | 0.2%
9 | 1 | 0.2%
10 | 1 | 0.2%
12 | 1 | 0.2%
13 | 1 | 0.2%
14 | 1 | 0.2%
16 | 1 | 0.2%
17 | 1 | 0.2%
20 | 1 | 0.2%
22 | 1 | 0.2%

Survived
Categorical

Statistic | Dataset A | Dataset B
Distinct | 2 | 2
Distinct (%) | 0.4% | 0.4%
Missing | 0 | 0
Missing (%) | 0.0% | 0.0%
Memory size | 7.0 KiB | 7.0 KiB

Value | Dataset A | Dataset B
0 | 272 | 278
1 | 174 | 168

Length

Statistic | Dataset A | Dataset B
Max length | 1 | 1
Median length | 1 | 1
Mean length | 1 | 1
Min length | 1 | 1

Characters and Unicode

Statistic | Dataset A | Dataset B
Total characters | 446 | 446
Distinct characters | 2 | 2
Distinct categories | 1 | 1
Distinct scripts | 1 | 1
Distinct blocks | 1 | 1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic | Dataset A | Dataset B
Unique | 0 | 0
Unique (%) | 0.0% | 0.0%

Sample

Row | Dataset A | Dataset B
1st row | 0 | 0
2nd row | 0 | 0
3rd row | 0 | 0
4th row | 1 | 0
5th row | 1 | 0

Common Values

Dataset A
Value | Count | Frequency (%)
0 | 272 | 61.0%
1 | 174 | 39.0%

Dataset B
Value | Count | Frequency (%)
0 | 278 | 62.3%
1 | 168 | 37.7%
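The Frequency (%) column in the Common Values tables is each count divided by the total number of rows. A minimal sketch, using the Survived counts from above (the function name is illustrative):

```python
def frequency_pct(counts: dict) -> dict:
    """Convert raw counts to percentages of the total, rounded to one decimal."""
    total = sum(counts.values())
    return {value: round(100 * n / total, 1) for value, n in counts.items()}


freq_a = frequency_pct({"0": 272, "1": 174})
freq_b = frequency_pct({"0": 278, "1": 168})
print(freq_a, freq_b)
```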

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figures: common values plots for Dataset A and Dataset B]

Most occurring characters

Dataset A
Value | Count | Frequency (%)
0 | 272 | 61.0%
1 | 174 | 39.0%

Dataset B
Value | Count | Frequency (%)
0 | 278 | 62.3%
1 | 168 | 37.7%

Most occurring categories

Dataset A
Value | Count | Frequency (%)
Decimal Number | 446 | 100.0%

Dataset B
Value | Count | Frequency (%)
Decimal Number | 446 | 100.0%

Most frequent character per category

Decimal Number

Dataset A
Value | Count | Frequency (%)
0 | 272 | 61.0%
1 | 174 | 39.0%

Dataset B
Value | Count | Frequency (%)
0 | 278 | 62.3%
1 | 168 | 37.7%

Most occurring scripts

Dataset A
Value | Count | Frequency (%)
Common | 446 | 100.0%

Dataset B
Value | Count | Frequency (%)
Common | 446 | 100.0%

Most frequent character per script

Common

Dataset A
Value | Count | Frequency (%)
0 | 272 | 61.0%
1 | 174 | 39.0%

Dataset B
Value | Count | Frequency (%)
0 | 278 | 62.3%
1 | 168 | 37.7%

Most occurring blocks

Dataset A
Value | Count | Frequency (%)
ASCII | 446 | 100.0%

Dataset B
Value | Count | Frequency (%)
ASCII | 446 | 100.0%

Most frequent character per block

ASCII

Dataset A
Value | Count | Frequency (%)
0 | 272 | 61.0%
1 | 174 | 39.0%

Dataset B
Value | Count | Frequency (%)
0 | 278 | 62.3%
1 | 168 | 37.7%

Pclass
Categorical

Statistic | Dataset A | Dataset B
Distinct | 3 | 3
Distinct (%) | 0.7% | 0.7%
Missing | 0 | 0
Missing (%) | 0.0% | 0.0%
Memory size | 7.0 KiB | 7.0 KiB

Value | Dataset A | Dataset B
3 | 263 | 232
1 | 100 | 112
2 | 83 | 102

Length

Statistic | Dataset A | Dataset B
Max length | 1 | 1
Median length | 1 | 1
Mean length | 1 | 1
Min length | 1 | 1

Characters and Unicode

Statistic | Dataset A | Dataset B
Total characters | 446 | 446
Distinct characters | 3 | 3
Distinct categories | 1 | 1
Distinct scripts | 1 | 1
Distinct blocks | 1 | 1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic | Dataset A | Dataset B
Unique | 0 | 0
Unique (%) | 0.0% | 0.0%

Sample

Row | Dataset A | Dataset B
1st row | 3 | 1
2nd row | 3 | 3
3rd row | 3 | 3
4th row | 3 | 3
5th row | 3 | 3

Common Values

Dataset A
Value | Count | Frequency (%)
3 | 263 | 59.0%
1 | 100 | 22.4%
2 | 83 | 18.6%

Dataset B
Value | Count | Frequency (%)
3 | 232 | 52.0%
1 | 112 | 25.1%
2 | 102 | 22.9%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figures: common values plots for Dataset A and Dataset B]

Most occurring characters

Dataset A
Value | Count | Frequency (%)
3 | 263 | 59.0%
1 | 100 | 22.4%
2 | 83 | 18.6%

Dataset B
Value | Count | Frequency (%)
3 | 232 | 52.0%
1 | 112 | 25.1%
2 | 102 | 22.9%

Most occurring categories

Dataset A
Value | Count | Frequency (%)
Decimal Number | 446 | 100.0%

Dataset B
Value | Count | Frequency (%)
Decimal Number | 446 | 100.0%

Most frequent character per category

Decimal Number

Dataset A
Value | Count | Frequency (%)
3 | 263 | 59.0%
1 | 100 | 22.4%
2 | 83 | 18.6%

Dataset B
Value | Count | Frequency (%)
3 | 232 | 52.0%
1 | 112 | 25.1%
2 | 102 | 22.9%

Most occurring scripts

Dataset A
Value | Count | Frequency (%)
Common | 446 | 100.0%

Dataset B
Value | Count | Frequency (%)
Common | 446 | 100.0%

Most frequent character per script

Common

Dataset A
Value | Count | Frequency (%)
3 | 263 | 59.0%
1 | 100 | 22.4%
2 | 83 | 18.6%

Dataset B
Value | Count | Frequency (%)
3 | 232 | 52.0%
1 | 112 | 25.1%
2 | 102 | 22.9%

Most occurring blocks

Dataset A
Value | Count | Frequency (%)
ASCII | 446 | 100.0%

Dataset B
Value | Count | Frequency (%)
ASCII | 446 | 100.0%

Most frequent character per block

ASCII

Dataset A
Value | Count | Frequency (%)
3 | 263 | 59.0%
1 | 100 | 22.4%
2 | 83 | 18.6%

Dataset B
Value | Count | Frequency (%)
3 | 232 | 52.0%
1 | 112 | 25.1%
2 | 102 | 22.9%

Name
Text

Statistic | Dataset A | Dataset B
Distinct | 446 | 446
Distinct (%) | 100.0% | 100.0%
Missing | 0 | 0
Missing (%) | 0.0% | 0.0%
Memory size | 7.0 KiB | 7.0 KiB

Length

Statistic | Dataset A | Dataset B
Max length | 82 | 82
Median length | 52 | 49
Mean length | 27.426009 | 26.746637
Min length | 12 | 12

Characters and Unicode

Statistic | Dataset A | Dataset B
Total characters | 12232 | 11929
Distinct characters | 60 | 59
Distinct categories | 7 | 7
Distinct scripts | 2 | 2
Distinct blocks | 1 | 1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic | Dataset A | Dataset B
Unique | 446 | 446
Unique (%) | 100.0% | 100.0%

Sample

Row | Dataset A | Dataset B
1st row | Webber, Mr. James | Allison, Mrs. Hudson J C (Bessie Waldo Daniels)
2nd row | Harknett, Miss. Alice Phoebe | Balkic, Mr. Cerin
3rd row | Andersson, Miss. Sigrid Elisabeth | Boulos, Miss. Nourelain
4th row | Dean, Master. Bertram Vere | Salonen, Mr. Johan Werner
5th row | Dahl, Mr. Karl Edwart | O'Connell, Mr. Patrick D

Dataset A
Value | Count | Frequency (%)
mr | 259 | 14.0%
miss | 94 | 5.1%
mrs | 69 | 3.7%
william | 32 | 1.7%
john | 25 | 1.4%
henry | 18 | 1.0%
thomas | 16 | 0.9%
master | 15 | 0.8%
johan | 10 | 0.5%
frederick | 10 | 0.5%
Other values (892) | 1296 | 70.3%

Dataset B
Value | Count | Frequency (%)
mr | 266 | 14.8%
miss | 92 | 5.1%
mrs | 60 | 3.3%
william | 31 | 1.7%
john | 22 | 1.2%
master | 18 | 1.0%
henry | 18 | 1.0%
charles | 17 | 0.9%
james | 15 | 0.8%
george | 13 | 0.7%
Other values (870) | 1246 | 69.3%
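The word tables for Name list lowercased tokens such as "mr" and "miss" without their trailing periods. The sketch below shows one way such counts could be produced; the tokenisation rule (lowercase, split on whitespace, strip surrounding punctuation) is an assumption for illustration and may differ from what ydata-profiling does internally. It runs on the five Dataset A sample rows only.

```python
import string
from collections import Counter


def word_counts(values):
    """Count lowercased whitespace-separated tokens, stripping edge punctuation."""
    counts = Counter()
    for value in values:
        for token in value.lower().split():
            token = token.strip(string.punctuation)
            if token:
                counts[token] += 1
    return counts


# The five Dataset A sample rows shown for Name.
sample_a = [
    "Webber, Mr. James",
    "Harknett, Miss. Alice Phoebe",
    "Andersson, Miss. Sigrid Elisabeth",
    "Dean, Master. Bertram Vere",
    "Dahl, Mr. Karl Edwart",
]
counts = word_counts(sample_a)
print(counts.most_common(3))
```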

Most occurring characters

Dataset A
Value | Count | Frequency (%)
(space) | 1399 | 11.4%
r | 986 | 8.1%
e | 878 | 7.2%
a | 853 | 7.0%
s | 670 | 5.5%
n | 652 | 5.3%
i | 643 | 5.3%
M | 570 | 4.7%
l | 541 | 4.4%
o | 508 | 4.2%
Other values (50) | 4532 | 37.1%

Dataset B
Value | Count | Frequency (%)
(space) | 1353 | 11.3%
r | 995 | 8.3%
e | 885 | 7.4%
a | 807 | 6.8%
n | 663 | 5.6%
s | 645 | 5.4%
i | 641 | 5.4%
M | 551 | 4.6%
l | 517 | 4.3%
o | 496 | 4.2%
Other values (49) | 4376 | 36.7%

Most occurring categories

Dataset A
Value | Count | Frequency (%)
Lowercase Letter | 7854 | 64.2%
Uppercase Letter | 1858 | 15.2%
Space Separator | 1399 | 11.4%
Other Punctuation | 966 | 7.9%
Open Punctuation | 73 | 0.6%
Close Punctuation | 73 | 0.6%
Dash Punctuation | 9 | 0.1%

Dataset B
Value | Count | Frequency (%)
Lowercase Letter | 7689 | 64.5%
Uppercase Letter | 1811 | 15.2%
Space Separator | 1353 | 11.3%
Other Punctuation | 944 | 7.9%
Close Punctuation | 63 | 0.5%
Open Punctuation | 63 | 0.5%
Dash Punctuation | 6 | 0.1%
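The category names in these tables correspond to Unicode general categories. As a rough illustration, Python's standard `unicodedata.category` returns the two-letter category code of a character; the mapping from codes to the long names below is written by hand for the categories that appear in this report, and the input is one of the Dataset B sample names.

```python
import unicodedata
from collections import Counter

# Hand-written mapping from Unicode general category codes to long names.
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Nd": "Decimal Number",
    "Zs": "Space Separator",
    "Po": "Other Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Pd": "Dash Punctuation",
}


def category_counts(text: str) -> Counter:
    """Count characters per Unicode general category."""
    codes = (unicodedata.category(ch) for ch in text)
    return Counter(CATEGORY_NAMES.get(code, code) for code in codes)


counts = category_counts("O'Connell, Mr. Patrick D")
for name, n in counts.most_common():
    print(name, n)
```

Summing such counters over every value of a column would give totals of the kind shown in the tables above.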

Most frequent character per category

Space Separator

Dataset A
Value | Count | Frequency (%)
(space) | 1399 | 100.0%

Dataset B
Value | Count | Frequency (%)
(space) | 1353 | 100.0%

Lowercase Letter

Dataset A
Value | Count | Frequency (%)
r | 986 | 12.6%
e | 878 | 11.2%
a | 853 | 10.9%
s | 670 | 8.5%
n | 652 | 8.3%
i | 643 | 8.2%
l | 541 | 6.9%
o | 508 | 6.5%
t | 352 | 4.5%
h | 273 | 3.5%
Other values (16) | 1498 | 19.1%

Dataset B
Value | Count | Frequency (%)
r | 995 | 12.9%
e | 885 | 11.5%
a | 807 | 10.5%
n | 663 | 8.6%
s | 645 | 8.4%
i | 641 | 8.3%
l | 517 | 6.7%
o | 496 | 6.5%
t | 343 | 4.5%
h | 274 | 3.6%
Other values (16) | 1423 | 18.5%

Uppercase Letter

Dataset A
Value | Count | Frequency (%)
M | 570 | 30.7%
A | 137 | 7.4%
J | 117 | 6.3%
H | 95 | 5.1%
S | 91 | 4.9%
E | 86 | 4.6%
B | 81 | 4.4%
C | 80 | 4.3%
W | 67 | 3.6%
P | 61 | 3.3%
Other values (15) | 473 | 25.5%

Dataset B
Value | Count | Frequency (%)
M | 551 | 30.4%
A | 116 | 6.4%
H | 110 | 6.1%
J | 97 | 5.4%
C | 92 | 5.1%
E | 90 | 5.0%
S | 83 | 4.6%
R | 71 | 3.9%
B | 69 | 3.8%
L | 68 | 3.8%
Other values (14) | 464 | 25.6%

Other Punctuation

Dataset A
Value | Count | Frequency (%)
. | 447 | 46.3%
, | 446 | 46.2%
" | 64 | 6.6%
' | 8 | 0.8%
/ | 1 | 0.1%

Dataset B
Value | Count | Frequency (%)
. | 446 | 47.2%
, | 446 | 47.2%
" | 46 | 4.9%
' | 5 | 0.5%
/ | 1 | 0.1%

Open Punctuation

Dataset A
Value | Count | Frequency (%)
( | 73 | 100.0%

Dataset B
Value | Count | Frequency (%)
( | 63 | 100.0%

Close Punctuation

Dataset A
Value | Count | Frequency (%)
) | 73 | 100.0%

Dataset B
Value | Count | Frequency (%)
) | 63 | 100.0%

Dash Punctuation

Dataset A
Value | Count | Frequency (%)
- | 9 | 100.0%

Dataset B
Value | Count | Frequency (%)
- | 6 | 100.0%

Most occurring scripts

Dataset A
Value | Count | Frequency (%)
Latin | 9712 | 79.4%
Common | 2520 | 20.6%

Dataset B
Value | Count | Frequency (%)
Latin | 9500 | 79.6%
Common | 2429 | 20.4%

Most frequent character per script

Common

Dataset A
Value | Count | Frequency (%)
(space) | 1399 | 55.5%
. | 447 | 17.7%
, | 446 | 17.7%
( | 73 | 2.9%
) | 73 | 2.9%
" | 64 | 2.5%
- | 9 | 0.4%
' | 8 | 0.3%
/ | 1 | < 0.1%

Dataset B
Value | Count | Frequency (%)
(space) | 1353 | 55.7%
. | 446 | 18.4%
, | 446 | 18.4%
) | 63 | 2.6%
( | 63 | 2.6%
" | 46 | 1.9%
- | 6 | 0.2%
' | 5 | 0.2%
/ | 1 | < 0.1%

Latin

Dataset A
Value | Count | Frequency (%)
r | 986 | 10.2%
e | 878 | 9.0%
a | 853 | 8.8%
s | 670 | 6.9%
n | 652 | 6.7%
i | 643 | 6.6%
M | 570 | 5.9%
l | 541 | 5.6%
o | 508 | 5.2%
t | 352 | 3.6%
Other values (41) | 3059 | 31.5%

Dataset B
Value | Count | Frequency (%)
r | 995 | 10.5%
e | 885 | 9.3%
a | 807 | 8.5%
n | 663 | 7.0%
s | 645 | 6.8%
i | 641 | 6.7%
M | 551 | 5.8%
l | 517 | 5.4%
o | 496 | 5.2%
t | 343 | 3.6%
Other values (40) | 2957 | 31.1%

Most occurring blocks

Dataset A
Value | Count | Frequency (%)
ASCII | 12232 | 100.0%

Dataset B
Value | Count | Frequency (%)
ASCII | 11929 | 100.0%

Most frequent character per block

ASCII

Dataset A
Value | Count | Frequency (%)
(space) | 1399 | 11.4%
r | 986 | 8.1%
e | 878 | 7.2%
a | 853 | 7.0%
s | 670 | 5.5%
n | 652 | 5.3%
i | 643 | 5.3%
M | 570 | 4.7%
l | 541 | 4.4%
o | 508 | 4.2%
Other values (50) | 4532 | 37.1%

Dataset B
Value | Count | Frequency (%)
(space) | 1353 | 11.3%
r | 995 | 8.3%
e | 885 | 7.4%
a | 807 | 6.8%
n | 663 | 5.6%
s | 645 | 5.4%
i | 641 | 5.4%
M | 551 | 4.6%
l | 517 | 4.3%
o | 496 | 4.2%
Other values (49) | 4376 | 36.7%

Sex
Categorical

Statistic | Dataset A | Dataset B
Distinct | 2 | 2
Distinct (%) | 0.4% | 0.4%
Missing | 0 | 0
Missing (%) | 0.0% | 0.0%
Memory size | 7.0 KiB | 7.0 KiB

Value | Dataset A | Dataset B
male | 284 | 293
female | 162 | 153

Length

Statistic | Dataset A | Dataset B
Max length | 6 | 6
Median length | 4 | 4
Mean length | 4.7264574 | 4.6860987
Min length | 4 | 4

Characters and Unicode

Statistic | Dataset A | Dataset B
Total characters | 2108 | 2090
Distinct characters | 5 | 5
Distinct categories | 1 | 1
Distinct scripts | 1 | 1
Distinct blocks | 1 | 1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic | Dataset A | Dataset B
Unique | 0 | 0
Unique (%) | 0.0% | 0.0%

Sample

Row | Dataset A | Dataset B
1st row | male | female
2nd row | female | male
3rd row | female | female
4th row | male | male
5th row | male | male

Common Values

Dataset A
Value | Count | Frequency (%)
male | 284 | 63.7%
female | 162 | 36.3%

Dataset B
Value | Count | Frequency (%)
male | 293 | 65.7%
female | 153 | 34.3%
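For a categorical column, the Total characters and Mean length figures follow directly from the value counts. A minimal sketch using the Sex counts above (the helper name is illustrative):

```python
def length_stats(counts: dict) -> tuple:
    """Return (total characters, mean length) for a {value: count} mapping."""
    total_values = sum(counts.values())
    total_chars = sum(len(value) * n for value, n in counts.items())
    return total_chars, total_chars / total_values


chars_a, mean_len_a = length_stats({"male": 284, "female": 162})
chars_b, mean_len_b = length_stats({"male": 293, "female": 153})
print(chars_a, round(mean_len_a, 7), chars_b, round(mean_len_b, 7))
```

Because "male" outnumbers "female" in both datasets, the median length is 4 even though the mean sits between 4 and 6.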

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figures: common values plots for Dataset A and Dataset B]

Most occurring characters

Dataset A
Value | Count | Frequency (%)
e | 608 | 28.8%
m | 446 | 21.2%
a | 446 | 21.2%
l | 446 | 21.2%
f | 162 | 7.7%

Dataset B
Value | Count | Frequency (%)
e | 599 | 28.7%
m | 446 | 21.3%
a | 446 | 21.3%
l | 446 | 21.3%
f | 153 | 7.3%

Most occurring categories

Dataset A
Value | Count | Frequency (%)
Lowercase Letter | 2108 | 100.0%

Dataset B
Value | Count | Frequency (%)
Lowercase Letter | 2090 | 100.0%

Most frequent character per category

Lowercase Letter

Dataset A
Value | Count | Frequency (%)
e | 608 | 28.8%
m | 446 | 21.2%
a | 446 | 21.2%
l | 446 | 21.2%
f | 162 | 7.7%

Dataset B
Value | Count | Frequency (%)
e | 599 | 28.7%
m | 446 | 21.3%
a | 446 | 21.3%
l | 446 | 21.3%
f | 153 | 7.3%

Most occurring scripts

Dataset A
Value | Count | Frequency (%)
Latin | 2108 | 100.0%

Dataset B
Value | Count | Frequency (%)
Latin | 2090 | 100.0%

Most frequent character per script

Latin

Dataset A
Value | Count | Frequency (%)
e | 608 | 28.8%
m | 446 | 21.2%
a | 446 | 21.2%
l | 446 | 21.2%
f | 162 | 7.7%

Dataset B
Value | Count | Frequency (%)
e | 599 | 28.7%
m | 446 | 21.3%
a | 446 | 21.3%
l | 446 | 21.3%
f | 153 | 7.3%

Most occurring blocks

Dataset A
Value | Count | Frequency (%)
ASCII | 2108 | 100.0%

Dataset B
Value | Count | Frequency (%)
ASCII | 2090 | 100.0%

Most frequent character per block

ASCII

Dataset A
Value | Count | Frequency (%)
e | 608 | 28.8%
m | 446 | 21.2%
a | 446 | 21.2%
l | 446 | 21.2%
f | 162 | 7.7%

Dataset B
Value | Count | Frequency (%)
e | 599 | 28.7%
m | 446 | 21.3%
a | 446 | 21.3%
l | 446 | 21.3%
f | 153 | 7.3%

Age
Real number (ℝ)

Statistic | Dataset A | Dataset B
Distinct | 72 | 76
Distinct (%) | 20.3% | 21.4%
Missing | 91 | 91
Missing (%) | 20.4% | 20.4%
Infinite | 0 | 0
Infinite (%) | 0.0% | 0.0%
Mean | 29.284986 | 29.341324
Minimum | 0.42 | 0.42
Maximum | 74 | 71
Zeros | 0 | 0
Zeros (%) | 0.0% | 0.0%
Negative | 0 | 0
Negative (%) | 0.0% | 0.0%
Memory size | 7.0 KiB | 7.0 KiB

Quantile statistics

Statistic | Dataset A | Dataset B
Minimum | 0.42 | 0.42
5-th percentile | 4.7 | 3.7
Q1 | 19.5 | 20
median | 28 | 28
Q3 | 37.5 | 39
95-th percentile | 54 | 54
Maximum | 74 | 71
Range | 73.58 | 70.58
Interquartile range (IQR) | 18 | 19

Descriptive statistics

Statistic | Dataset A | Dataset B
Standard deviation | 14.16599 | 14.453535
Coefficient of variation (CV) | 0.48372876 | 0.49259997
Kurtosis | 0.15303096 | -0.24400782
Mean | 29.284986 | 29.341324
Median Absolute Deviation (MAD) | 9 | 9
Skewness | 0.40395706 | 0.21125129
Sum | 10396.17 | 10416.17
Variance | 200.67527 | 208.90468
Monotonicity | Not monotonic | Not monotonic

Histogram with fixed size bins (bins=50)
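Age is the one numeric variable here with missing values, and the reported figures suggest two different denominators: Missing (%) uses all 446 rows, while Distinct (%) and the mean appear to use only the 355 non-missing rows. The sketch below checks this reading against the reported numbers; it is an inference from the figures, not a statement about the library's internals.

```python
N_OBS = 446
N_MISSING = 91
n_present = N_OBS - N_MISSING  # 355 non-missing ages

# Missing (%) is relative to all rows.
missing_pct = 100 * N_MISSING / N_OBS

# Distinct (%) and the mean match only when divided by the non-missing count.
distinct_pct_a = 100 * 72 / n_present
distinct_pct_b = 100 * 76 / n_present
mean_a = 10396.17 / n_present
mean_b = 10416.17 / n_present

print(f"{missing_pct:.1f}% {distinct_pct_a:.1f}% {distinct_pct_b:.1f}%")
print(f"{mean_a:.6f} {mean_b:.6f}")
```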
Dataset A
Value | Count | Frequency (%)
19 | 16 | 3.6%
24 | 15 | 3.4%
36 | 14 | 3.1%
28 | 14 | 3.1%
16 | 12 | 2.7%
22 | 12 | 2.7%
18 | 12 | 2.7%
25 | 12 | 2.7%
26 | 12 | 2.7%
32 | 11 | 2.5%
Other values (62) | 225 | 50.4%
(Missing) | 91 | 20.4%

Dataset B
Value | Count | Frequency (%)
25 | 13 | 2.9%
21 | 13 | 2.9%
24 | 13 | 2.9%
18 | 12 | 2.7%
30 | 12 | 2.7%
22 | 12 | 2.7%
19 | 12 | 2.7%
20 | 12 | 2.7%
16 | 11 | 2.5%
31 | 10 | 2.2%
Other values (66) | 235 | 52.7%
(Missing) | 91 | 20.4%

Dataset A
Value | Count | Frequency (%)
0.42 | 1 | 0.2%
0.75 | 1 | 0.2%
1 | 3 | 0.7%
2 | 6 | 1.3%
3 | 4 | 0.9%
4 | 3 | 0.7%
5 | 3 | 0.7%
6 | 1 | 0.2%
7 | 1 | 0.2%
8 | 3 | 0.7%

Dataset B
Value | Count | Frequency (%)
0.42 | 1 | 0.2%
0.67 | 1 | 0.2%
0.75 | 2 | 0.4%
0.83 | 2 | 0.4%
0.92 | 1 | 0.2%
1 | 2 | 0.4%
2 | 6 | 1.3%
3 | 3 | 0.7%
4 | 3 | 0.7%
5 | 3 | 0.7%

SibSp
Real number (ℝ)

Statistic | Dataset A | Dataset B
Distinct | 7 | 7
Distinct (%) | 1.6% | 1.6%
Missing | 0 | 0
Missing (%) | 0.0% | 0.0%
Infinite | 0 | 0
Infinite (%) | 0.0% | 0.0%
Mean | 0.5426009 | 0.56278027
Minimum | 0 | 0
Maximum | 8 | 8
Zeros | 297 | 302
Zeros (%) | 66.6% | 67.7%
Negative | 0 | 0
Negative (%) | 0.0% | 0.0%
Memory size | 7.0 KiB | 7.0 KiB

Quantile statistics

Statistic | Dataset A | Dataset B
Minimum | 0 | 0
5-th percentile | 0 | 0
Q1 | 0 | 0
median | 0 | 0
Q3 | 1 | 1
95-th percentile | 2 | 2.75
Maximum | 8 | 8
Range | 8 | 8
Interquartile range (IQR) | 1 | 1

Descriptive statistics

Statistic | Dataset A | Dataset B
Standard deviation | 1.1224878 | 1.2102021
Coefficient of variation (CV) | 2.0687172 | 2.1503989
Kurtosis | 18.535871 | 16.771978
Mean | 0.5426009 | 0.56278027
Median Absolute Deviation (MAD) | 0 | 0
Skewness | 3.7667746 | 3.6922183
Sum | 242 | 251
Variance | 1.2599788 | 1.4645891
Monotonicity | Not monotonic | Not monotonic

Histogram with fixed size bins (bins=7)
Dataset A
Value | Count | Frequency (%)
0 | 297 | 66.6%
1 | 111 | 24.9%
2 | 17 | 3.8%
4 | 8 | 1.8%
3 | 6 | 1.3%
8 | 4 | 0.9%
5 | 3 | 0.7%

Dataset B
Value | Count | Frequency (%)
0 | 302 | 67.7%
1 | 103 | 23.1%
2 | 18 | 4.0%
4 | 8 | 1.8%
3 | 5 | 1.1%
5 | 5 | 1.1%
8 | 5 | 1.1%
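Because SibSp takes only seven distinct values, its Sum, Mean and Zeros (%) can be recomputed from the frequency table alone. A minimal sketch, with the counts transcribed from the tables above (the helper name is illustrative):

```python
def summarize(freq: dict) -> tuple:
    """Return (n, sum, mean, zero percentage) from a {value: count} table."""
    n = sum(freq.values())
    total = sum(value * count for value, count in freq.items())
    zeros_pct = 100 * freq.get(0, 0) / n
    return n, total, total / n, zeros_pct


sibsp_a = {0: 297, 1: 111, 2: 17, 4: 8, 3: 6, 8: 4, 5: 3}
sibsp_b = {0: 302, 1: 103, 2: 18, 4: 8, 3: 5, 5: 5, 8: 5}

n_a, sum_a, mean_a, zeros_a = summarize(sibsp_a)
n_b, sum_b, mean_b, zeros_b = summarize(sibsp_b)
print(n_a, sum_a, round(mean_a, 7), round(zeros_a, 1))
print(n_b, sum_b, round(mean_b, 8), round(zeros_b, 1))
```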

Dataset A
Value | Count | Frequency (%)
0 | 297 | 66.6%
1 | 111 | 24.9%
2 | 17 | 3.8%
3 | 6 | 1.3%
4 | 8 | 1.8%
5 | 3 | 0.7%
8 | 4 | 0.9%

Dataset B
Value | Count | Frequency (%)
0 | 302 | 67.7%
1 | 103 | 23.1%
2 | 18 | 4.0%
3 | 5 | 1.1%
4 | 8 | 1.8%
5 | 5 | 1.1%
8 | 5 | 1.1%

Parch
Real number (ℝ)

Statistic | Dataset A | Dataset B
Distinct | 7 | 7
Distinct (%) | 1.6% | 1.6%
Missing | 0 | 0
Missing (%) | 0.0% | 0.0%
Infinite | 0 | 0
Infinite (%) | 0.0% | 0.0%
Mean | 0.367713 | 0.39013453
Minimum | 0 | 0
Maximum | 6 | 6
Zeros | 345 | 338
Zeros (%) | 77.4% | 75.8%
Negative | 0 | 0
Negative (%) | 0.0% | 0.0%
Memory size | 7.0 KiB | 7.0 KiB

Quantile statistics

Statistic | Dataset A | Dataset B
Minimum | 0 | 0
5-th percentile | 0 | 0
Q1 | 0 | 0
median | 0 | 0
Q3 | 0 | 0
95-th percentile | 2 | 2
Maximum | 6 | 6
Range | 6 | 6
Interquartile range (IQR) | 0 | 0

Descriptive statistics

Statistic | Dataset A | Dataset B
Standard deviation | 0.82611187 | 0.82939885
Coefficient of variation (CV) | 2.2466213 | 2.1259304
Kurtosis | 12.319631 | 11.125881
Mean | 0.367713 | 0.39013453
Median Absolute Deviation (MAD) | 0 | 0
Skewness | 3.0941863 | 2.8930729
Sum | 164 | 174
Variance | 0.68246083 | 0.68790245
Monotonicity | Not monotonic | Not monotonic

Histogram with fixed size bins (bins=7)
Dataset A
Value | Count | Frequency (%)
0 | 345 | 77.4%
1 | 58 | 13.0%
2 | 34 | 7.6%
5 | 3 | 0.7%
3 | 3 | 0.7%
4 | 2 | 0.4%
6 | 1 | 0.2%

Dataset B
Value | Count | Frequency (%)
0 | 338 | 75.8%
1 | 60 | 13.5%
2 | 40 | 9.0%
3 | 3 | 0.7%
5 | 3 | 0.7%
6 | 1 | 0.2%
4 | 1 | 0.2%

Dataset A
Value | Count | Frequency (%)
0 | 345 | 77.4%
1 | 58 | 13.0%
2 | 34 | 7.6%
3 | 3 | 0.7%
4 | 2 | 0.4%
5 | 3 | 0.7%
6 | 1 | 0.2%

Dataset B
Value | Count | Frequency (%)
0 | 338 | 75.8%
1 | 60 | 13.5%
2 | 40 | 9.0%
3 | 3 | 0.7%
4 | 1 | 0.2%
5 | 3 | 0.7%
6 | 1 | 0.2%

Ticket
Text

Statistic | Dataset A | Dataset B
Distinct | 378 | 380
Distinct (%) | 84.8% | 85.2%
Missing | 0 | 0
Missing (%) | 0.0% | 0.0%
Memory size | 7.0 KiB | 7.0 KiB

Length

Statistic | Dataset A | Dataset B
Max length | 18 | 18
Median length | 17 | 17
Mean length | 6.5650224 | 6.6547085
Min length | 3 | 3

Characters and Unicode

Statistic | Dataset A | Dataset B
Total characters | 2928 | 2968
Distinct characters | 34 | 32
Distinct categories | 5 | 5
Distinct scripts | 2 | 2
Distinct blocks | 1 | 1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic | Dataset A | Dataset B
Unique | 326 | 339
Unique (%) | 73.1% | 76.0%

Sample

Row | Dataset A | Dataset B
1st row | SOTON/OQ 3101316 | 113781
2nd row | W./C. 6609 | 349248
3rd row | 347082 | 2678
4th row | C.A. 2315 | 3101296
5th row | 7598 | 334912
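Ticket is a useful case for telling Distinct apart from Unique: Dataset A has 378 distinct ticket values but only 326 unique ones. One reading consistent with those numbers is that "distinct" counts different values, while "unique" counts values that occur exactly once. The sketch below illustrates that reading on a small made-up list; the ticket strings are illustrative and are not a sample of either dataset.

```python
from collections import Counter


def distinct_and_unique(values) -> tuple:
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for n in counts.values() if n == 1)
    return distinct, unique


# Illustrative values only: one ticket shared by two passengers, three singletons.
tickets = ["347082", "347082", "2144", "113781", "7598"]
distinct, unique = distinct_and_unique(tickets)
print(distinct, unique)
```

Shared tickets lower the unique count more than the distinct count, which is why Unique (%) sits below Distinct (%) for this column.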

Dataset A
Value | Count | Frequency (%)
pc | 24 | 4.3%
c.a | 15 | 2.7%
ca | 9 | 1.6%
ston/o | 7 | 1.3%
2 | 7 | 1.3%
a/5 | 7 | 1.3%
2144 | 4 | 0.7%
a/4 | 4 | 0.7%
w./c | 4 | 0.7%
line | 4 | 0.7%
Other values (395) | 468 | 84.6%

Dataset B
Value | Count | Frequency (%)
pc | 32 | 5.7%
a/5 | 12 | 2.1%
ca | 11 | 1.9%
c.a | 9 | 1.6%
w./c | 7 | 1.2%
2144 | 6 | 1.1%
382652 | 5 | 0.9%
sc/paris | 5 | 0.9%
2343 | 5 | 0.9%
347082 | 5 | 0.9%
Other values (401) | 468 | 82.8%

Most occurring characters

Dataset A
Value | Count | Frequency (%)
3 | 381 | 13.0%
1 | 337 | 11.5%
2 | 297 | 10.1%
4 | 248 | 8.5%
7 | 239 | 8.2%
6 | 206 | 7.0%
0 | 201 | 6.9%
5 | 191 | 6.5%
9 | 165 | 5.6%
8 | 138 | 4.7%
Other values (24) | 525 | 17.9%

Dataset B
Value | Count | Frequency (%)
3 | 355 | 12.0%
1 | 344 | 11.6%
2 | 304 | 10.2%
4 | 251 | 8.5%
7 | 249 | 8.4%
6 | 210 | 7.1%
0 | 191 | 6.4%
5 | 184 | 6.2%
9 | 161 | 5.4%
8 | 145 | 4.9%
Other values (22) | 574 | 19.3%

Most occurring categories

Dataset A
Value | Count | Frequency (%)
Decimal Number | 2403 | 82.1%
Uppercase Letter | 278 | 9.5%
Other Punctuation | 128 | 4.4%
Space Separator | 107 | 3.7%
Lowercase Letter | 12 | 0.4%

Dataset B
Value | Count | Frequency (%)
Decimal Number | 2394 | 80.7%
Uppercase Letter | 300 | 10.1%
Other Punctuation | 146 | 4.9%
Space Separator | 119 | 4.0%
Lowercase Letter | 9 | 0.3%

Most frequent character per category

Decimal Number

Dataset A
Value | Count | Frequency (%)
3 | 381 | 15.9%
1 | 337 | 14.0%
2 | 297 | 12.4%
4 | 248 | 10.3%
7 | 239 | 9.9%
6 | 206 | 8.6%
0 | 201 | 8.4%
5 | 191 | 7.9%
9 | 165 | 6.9%
8 | 138 | 5.7%

Dataset B
Value | Count | Frequency (%)
3 | 355 | 14.8%
1 | 344 | 14.4%
2 | 304 | 12.7%
4 | 251 | 10.5%
7 | 249 | 10.4%
6 | 210 | 8.8%
0 | 191 | 8.0%
5 | 184 | 7.7%
9 | 161 | 6.7%
8 | 145 | 6.1%

Space Separator

Dataset A
Value | Count | Frequency (%)
(space) | 107 | 100.0%

Dataset B
Value | Count | Frequency (%)
(space) | 119 | 100.0%

Other Punctuation

Dataset A
Value | Count | Frequency (%)
. | 87 | 68.0%
/ | 41 | 32.0%

Dataset B
Value | Count | Frequency (%)
. | 98 | 67.1%
/ | 48 | 32.9%

Uppercase Letter

Dataset A
Value | Count | Frequency (%)
C | 64 | 23.0%
O | 43 | 15.5%
A | 40 | 14.4%
P | 40 | 14.4%
S | 28 | 10.1%
N | 20 | 7.2%
T | 16 | 5.8%
W | 6 | 2.2%
Q | 5 | 1.8%
E | 4 | 1.4%
Other values (5) | 12 | 4.3%

Dataset B
Value | Count | Frequency (%)
C | 84 | 28.0%
P | 46 | 15.3%
A | 44 | 14.7%
O | 36 | 12.0%
S | 32 | 10.7%
N | 12 | 4.0%
T | 11 | 3.7%
W | 11 | 3.7%
F | 6 | 2.0%
Q | 5 | 1.7%
Other values (5) | 13 | 4.3%

Lowercase Letter

Dataset A
Value | Count | Frequency (%)
a | 3 | 25.0%
s | 3 | 25.0%
r | 2 | 16.7%
i | 2 | 16.7%
l | 1 | 8.3%
e | 1 | 8.3%

Dataset B
Value | Count | Frequency (%)
a | 3 | 33.3%
i | 2 | 22.2%
s | 2 | 22.2%
r | 2 | 22.2%

Most occurring scripts

Dataset A
Value | Count | Frequency (%)
Common | 2638 | 90.1%
Latin | 290 | 9.9%

Dataset B
Value | Count | Frequency (%)
Common | 2659 | 89.6%
Latin | 309 | 10.4%

Most frequent character per script

Common

Dataset A
Value | Count | Frequency (%)
3 | 381 | 14.4%
1 | 337 | 12.8%
2 | 297 | 11.3%
4 | 248 | 9.4%
7 | 239 | 9.1%
6 | 206 | 7.8%
0 | 201 | 7.6%
5 | 191 | 7.2%
9 | 165 | 6.3%
8 | 138 | 5.2%
Other values (3) | 235 | 8.9%

Dataset B
Value | Count | Frequency (%)
3 | 355 | 13.4%
1 | 344 | 12.9%
2 | 304 | 11.4%
4 | 251 | 9.4%
7 | 249 | 9.4%
6 | 210 | 7.9%
0 | 191 | 7.2%
5 | 184 | 6.9%
9 | 161 | 6.1%
8 | 145 | 5.5%
Other values (3) | 265 | 10.0%

Latin

Dataset A
Value | Count | Frequency (%)
C | 64 | 22.1%
O | 43 | 14.8%
A | 40 | 13.8%
P | 40 | 13.8%
S | 28 | 9.7%
N | 20 | 6.9%
T | 16 | 5.5%
W | 6 | 2.1%
Q | 5 | 1.7%
E | 4 | 1.4%
Other values (11) | 24 | 8.3%

Dataset B
Value | Count | Frequency (%)
C | 84 | 27.2%
P | 46 | 14.9%
A | 44 | 14.2%
O | 36 | 11.7%
S | 32 | 10.4%
N | 12 | 3.9%
T | 11 | 3.6%
W | 11 | 3.6%
F | 6 | 1.9%
Q | 5 | 1.6%
Other values (9) | 22 | 7.1%

Most occurring blocks

Dataset A
Value | Count | Frequency (%)
ASCII | 2928 | 100.0%

Dataset B
Value | Count | Frequency (%)
ASCII | 2968 | 100.0%

Most frequent character per block

ASCII

Dataset A
Value | Count | Frequency (%)
3 | 381 | 13.0%
1 | 337 | 11.5%
2 | 297 | 10.1%
4 | 248 | 8.5%
7 | 239 | 8.2%
6 | 206 | 7.0%
0 | 201 | 6.9%
5 | 191 | 6.5%
9 | 165 | 5.6%
8 | 138 | 4.7%
Other values (24) | 525 | 17.9%

Dataset B
Value | Count | Frequency (%)
3 | 355 | 12.0%
1 | 344 | 11.6%
2 | 304 | 10.2%
4 | 251 | 8.5%
7 | 249 | 8.4%
6 | 210 | 7.1%
0 | 191 | 6.4%
5 | 184 | 6.2%
9 | 161 | 5.4%
8 | 145 | 4.9%
Other values (22) | 574 | 19.3%

Fare
Real number (ℝ)

Statistic | Dataset A | Dataset B
Distinct | 170 | 184
Distinct (%) | 38.1% | 41.3%
Missing | 0 | 0
Missing (%) | 0.0% | 0.0%
Infinite | 0 | 0
Infinite (%) | 0.0% | 0.0%
Mean | 28.683744 | 33.485911
Minimum | 0 | 0
Maximum | 512.3292 | 512.3292
Zeros | 12 | 3
Zeros (%) | 2.7% | 0.7%
Negative | 0 | 0
Negative (%) | 0.0% | 0.0%
Memory size | 7.0 KiB | 7.0 KiB

Quantile statistics

Statistic | Dataset A | Dataset B
Minimum | 0 | 0
5-th percentile | 7.05105 | 7.2292
Q1 | 7.8958 | 7.9031
median | 13 | 15.0479
Q3 | 29.125 | 31.359375
95-th percentile | 90 | 113.275
Maximum | 512.3292 | 512.3292
Range | 512.3292 | 512.3292
Interquartile range (IQR) | 21.2292 | 23.456275
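Some Fare quantiles, such as 31.359375 for Q3 in Dataset B, have more decimal places than any fare in the frequency tables, which suggests they are interpolated between neighbouring observations. The sketch below shows linear interpolation between order statistics, the default method of pandas `Series.quantile`; that the report uses exactly this method is an assumption, and the input list is a toy example, not the Fare data.

```python
def quantile_linear(sorted_values, q: float) -> float:
    """Quantile by linear interpolation between the two nearest order statistics."""
    position = q * (len(sorted_values) - 1)   # fractional index into the sorted data
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


toy = [1.0, 2.0, 3.0, 4.0]
q1 = quantile_linear(toy, 0.25)
median = quantile_linear(toy, 0.50)
q3 = quantile_linear(toy, 0.75)
print(q1, median, q3)

# With 446 observations, Q3 falls at fractional index 0.75 * 445 = 333.75,
# i.e. three quarters of the way between the 334th and 335th sorted values.
q3_position = 0.75 * (446 - 1)
```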

Descriptive statistics

Statistic | Dataset A | Dataset B
Standard deviation | 42.669032 | 50.731497
Coefficient of variation (CV) | 1.4875684 | 1.5150102
Kurtosis | 43.49269 | 37.783069
Mean | 28.683744 | 33.485911
Median Absolute Deviation (MAD) | 5.7708 | 7.76665
Skewness | 5.2552911 | 5.069095
Sum | 12792.95 | 14934.716
Variance | 1820.6463 | 2573.6848
Monotonicity | Not monotonic | Not monotonic

Histogram with fixed size bins (bins=50)
Dataset A
Value | Count | Frequency (%)
7.8958 | 23 | 5.2%
8.05 | 20 | 4.5%
13 | 19 | 4.3%
7.75 | 17 | 3.8%
26 | 14 | 3.1%
10.5 | 12 | 2.7%
0 | 12 | 2.7%
7.925 | 11 | 2.5%
7.225 | 9 | 2.0%
7.2292 | 9 | 2.0%
Other values (160) | 300 | 67.3%

Dataset B
Value | Count | Frequency (%)
7.8958 | 23 | 5.2%
13 | 23 | 5.2%
8.05 | 21 | 4.7%
7.75 | 20 | 4.5%
26 | 15 | 3.4%
10.5 | 14 | 3.1%
7.225 | 8 | 1.8%
7.25 | 8 | 1.8%
7.8542 | 8 | 1.8%
7.775 | 7 | 1.6%
Other values (174) | 299 | 67.0%

Dataset A
Value | Count | Frequency (%)
0 | 12 | 2.7%
4.0125 | 1 | 0.2%
6.2375 | 1 | 0.2%
6.45 | 1 | 0.2%
6.4958 | 1 | 0.2%
6.75 | 2 | 0.4%
6.975 | 1 | 0.2%
7.0458 | 1 | 0.2%
7.05 | 3 | 0.7%
7.0542 | 2 | 0.4%

Dataset B
Value | Count | Frequency (%)
0 | 3 | 0.7%
4.0125 | 1 | 0.2%
6.2375 | 1 | 0.2%
6.8583 | 1 | 0.2%
6.95 | 1 | 0.2%
7.05 | 3 | 0.7%
7.0542 | 1 | 0.2%
7.125 | 2 | 0.4%
7.225 | 8 | 1.8%
7.2292 | 6 | 1.3%

Cabin
Text

Statistic | Dataset A | Dataset B
Distinct | 78 | 85
Distinct (%) | 82.1% | 84.2%
Missing | 351 | 345
Missing (%) | 78.7% | 77.4%
Memory size | 7.0 KiB | 7.0 KiB

Length

Statistic | Dataset A | Dataset B
Max length | 15 | 15
Median length | 3 | 3
Mean length | 3.3473684 | 3.5445545
Min length | 1 | 1

Characters and Unicode

Statistic | Dataset A | Dataset B
Total characters | 318 | 358
Distinct characters | 18 | 18
Distinct categories | 3 | 3
Distinct scripts | 2 | 2
Distinct blocks | 1 | 1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic | Dataset A | Dataset B
Unique | 64 | 72
Unique (%) | 67.4% | 71.3%

Sample

Row | Dataset A | Dataset B
1st row | E63 | C22 C26
2nd row | B20 | B78
3rd row | C104 | E58
4th row | E44 | C126
5th row | B51 B53 B55 | C148

Dataset A
Value | Count | Frequency (%)
g6 | 4 | 3.7%
f33 | 3 | 2.8%
e24 | 2 | 1.9%
e8 | 2 | 1.9%
c65 | 2 | 1.9%
c123 | 2 | 1.9%
c2 | 2 | 1.9%
c68 | 2 | 1.9%
d | 2 | 1.9%
d20 | 2 | 1.9%
Other values (80) | 84 | 78.5%

Dataset B
Value | Count | Frequency (%)
c22 | 3 | 2.6%
f33 | 3 | 2.6%
c26 | 3 | 2.6%
f | 3 | 2.6%
e101 | 3 | 2.6%
e8 | 2 | 1.7%
d36 | 2 | 1.7%
c52 | 2 | 1.7%
c68 | 2 | 1.7%
g73 | 2 | 1.7%
Other values (84) | 90 | 78.3%

Most occurring characters

Dataset A
Value | Count | Frequency (%)
C | 33 | 10.4%
3 | 29 | 9.1%
6 | 29 | 9.1%
2 | 27 | 8.5%
B | 25 | 7.9%
D | 22 | 6.9%
5 | 21 | 6.6%
1 | 20 | 6.3%
8 | 17 | 5.3%
0 | 16 | 5.0%
Other values (8) | 79 | 24.8%

Dataset B
Value | Count | Frequency (%)
C | 40 | 11.2%
2 | 35 | 9.8%
1 | 32 | 8.9%
3 | 31 | 8.7%
6 | 28 | 7.8%
B | 25 | 7.0%
8 | 23 | 6.4%
E | 19 | 5.3%
4 | 18 | 5.0%
7 | 17 | 4.7%
Other values (8) | 90 | 25.1%

Most occurring categories

Dataset A
Value | Count | Frequency (%)
Decimal Number | 199 | 62.6%
Uppercase Letter | 107 | 33.6%
Space Separator | 12 | 3.8%

Dataset B
Value | Count | Frequency (%)
Decimal Number | 229 | 64.0%
Uppercase Letter | 115 | 32.1%
Space Separator | 14 | 3.9%

Most frequent character per category

Uppercase Letter

Dataset A
Value | Count | Frequency (%)
C | 33 | 30.8%
B | 25 | 23.4%
D | 22 | 20.6%
E | 14 | 13.1%
F | 7 | 6.5%
G | 4 | 3.7%
A | 2 | 1.9%

Dataset B
Value | Count | Frequency (%)
C | 40 | 34.8%
B | 25 | 21.7%
E | 19 | 16.5%
D | 17 | 14.8%
F | 7 | 6.1%
G | 4 | 3.5%
A | 3 | 2.6%

Decimal Number

Dataset A
Value | Count | Frequency (%)
3 | 29 | 14.6%
6 | 29 | 14.6%
2 | 27 | 13.6%
5 | 21 | 10.6%
1 | 20 | 10.1%
8 | 17 | 8.5%
0 | 16 | 8.0%
7 | 15 | 7.5%
4 | 13 | 6.5%
9 | 12 | 6.0%

Dataset B
Value | Count | Frequency (%)
2 | 35 | 15.3%
1 | 32 | 14.0%
3 | 31 | 13.5%
6 | 28 | 12.2%
8 | 23 | 10.0%
4 | 18 | 7.9%
7 | 17 | 7.4%
5 | 17 | 7.4%
0 | 14 | 6.1%
9 | 14 | 6.1%

Space Separator

Dataset A
Value | Count | Frequency (%)
(space) | 12 | 100.0%

Dataset B
Value | Count | Frequency (%)
(space) | 14 | 100.0%

Most occurring scripts

Dataset A
Value | Count | Frequency (%)
Common | 211 | 66.4%
Latin | 107 | 33.6%

Dataset B
Value | Count | Frequency (%)
Common | 243 | 67.9%
Latin | 115 | 32.1%

Most frequent character per script

Latin

Dataset A
Value | Count | Frequency (%)
C | 33 | 30.8%
B | 25 | 23.4%
D | 22 | 20.6%
E | 14 | 13.1%
F | 7 | 6.5%
G | 4 | 3.7%
A | 2 | 1.9%

Dataset B
Value | Count | Frequency (%)
C | 40 | 34.8%
B | 25 | 21.7%
E | 19 | 16.5%
D | 17 | 14.8%
F | 7 | 6.1%
G | 4 | 3.5%
A | 3 | 2.6%

Common

Dataset A
Value | Count | Frequency (%)
3 | 29 | 13.7%
6 | 29 | 13.7%
2 | 27 | 12.8%
5 | 21 | 10.0%
1 | 20 | 9.5%
8 | 17 | 8.1%
0 | 16 | 7.6%
7 | 15 | 7.1%
4 | 13 | 6.2%
9 | 12 | 5.7%

Dataset B
Value | Count | Frequency (%)
2 | 35 | 14.4%
1 | 32 | 13.2%
3 | 31 | 12.8%
6 | 28 | 11.5%
8 | 23 | 9.5%
4 | 18 | 7.4%
7 | 17 | 7.0%
5 | 17 | 7.0%
0 | 14 | 5.8%
(space) | 14 | 5.8%

Most occurring blocks

Dataset A
Value | Count | Frequency (%)
ASCII | 318 | 100.0%

Dataset B
Value | Count | Frequency (%)
ASCII | 358 | 100.0%

Most frequent character per block

ASCII

Dataset A
Value | Count | Frequency (%)
C | 33 | 10.4%
3 | 29 | 9.1%
6 | 29 | 9.1%
2 | 27 | 8.5%
B | 25 | 7.9%
D | 22 | 6.9%
5 | 21 | 6.6%
1 | 20 | 6.3%
8 | 17 | 5.3%
0 | 16 | 5.0%
Other values (8) | 79 | 24.8%

Dataset B
Value | Count | Frequency (%)
C | 40 | 11.2%
2 | 35 | 9.8%
1 | 32 | 8.9%
3 | 31 | 8.7%
6 | 28 | 7.8%
B | 25 | 7.0%
8 | 23 | 6.4%
E | 19 | 5.3%
4 | 18 | 5.0%
7 | 17 | 4.7%
Other values (8) | 90 | 25.1%

Embarked
Categorical

Statistic | Dataset A | Dataset B
Distinct | 3 | 3
Distinct (%) | 0.7% | 0.7%
Missing | 1 | 1
Missing (%) | 0.2% | 0.2%
Memory size | 7.0 KiB | 7.0 KiB

Value | Dataset A | Dataset B
S | 327 | 319
C | 79 | 81
Q | 39 | 45

Length

Statistic | Dataset A | Dataset B
Max length | 1 | 1
Median length | 1 | 1
Mean length | 1 | 1
Min length | 1 | 1

Characters and Unicode

Statistic | Dataset A | Dataset B
Total characters | 445 | 445
Distinct characters | 3 | 3
Distinct categories | 1 | 1
Distinct scripts | 1 | 1
Distinct blocks | 1 | 1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic | Dataset A | Dataset B
Unique | 0 | 0
Unique (%) | 0.0% | 0.0%

Sample

Row | Dataset A | Dataset B
1st row | S | S
2nd row | S | S
3rd row | S | C
4th row | S | S
5th row | S | Q

Common Values

Dataset A
Value | Count | Frequency (%)
S | 327 | 73.3%
C | 79 | 17.7%
Q | 39 | 8.7%
(Missing) | 1 | 0.2%

Dataset B
Value | Count | Frequency (%)
S | 319 | 71.5%
C | 81 | 18.2%
Q | 45 | 10.1%
(Missing) | 1 | 0.2%
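Embarked shows slightly different percentages for the same counts depending on the table: the Common Values table includes the single missing entry in the denominator (446 rows), whereas the character-level tables later in this section are based on the 445 characters that are actually present. A minimal sketch of both calculations for Dataset A (the function name is illustrative):

```python
def percentages(counts: dict, denominator: int) -> dict:
    """Percentages of each count against a given denominator, one decimal."""
    return {value: round(100 * n / denominator, 1) for value, n in counts.items()}


counts_a = {"S": 327, "C": 79, "Q": 39}
n_missing = 1
n_present = sum(counts_a.values())   # 445 non-missing values
n_rows = n_present + n_missing       # 446 rows in total

with_missing = percentages(counts_a, n_rows)
without_missing = percentages(counts_a, n_present)
print(with_missing)
print(without_missing)
```

The first result matches the Common Values table (73.3%, 17.7%, 8.7%); the second matches the character tables (73.5%, 17.8%, 8.8%).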

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figures: common values plots for Dataset A and Dataset B]

Dataset A
Value | Count | Frequency (%)
s | 327 | 73.5%
c | 79 | 17.8%
q | 39 | 8.8%

Dataset B
Value | Count | Frequency (%)
s | 319 | 71.7%
c | 81 | 18.2%
q | 45 | 10.1%

Most occurring characters

ValueCountFrequency (%)
S 327
73.5%
C 79
 
17.8%
Q 39
 
8.8%
ValueCountFrequency (%)
S 319
71.7%
C 81
 
18.2%
Q 45
 
10.1%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 445
100.0%
ValueCountFrequency (%)
Uppercase Letter 445
100.0%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
S | 327 | 73.5%
C | 79 | 17.8%
Q | 39 | 8.8%

Value | Count | Frequency (%)
S | 319 | 71.7%
C | 81 | 18.2%
Q | 45 | 10.1%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 445 | 100.0%

Value | Count | Frequency (%)
Latin | 445 | 100.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
S | 327 | 73.5%
C | 79 | 17.8%
Q | 39 | 8.8%

Value | Count | Frequency (%)
S | 319 | 71.7%
C | 81 | 18.2%
Q | 45 | 10.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 445 | 100.0%

Value | Count | Frequency (%)
ASCII | 445 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
S | 327 | 73.5%
C | 79 | 17.8%
Q | 39 | 8.8%

Value | Count | Frequency (%)
S | 319 | 71.7%
C | 81 | 18.2%
Q | 45 | 10.1%

Interactions

[25 pairs of interaction plots, each pair showing Dataset A and Dataset B]

Missing values

Dataset A

A simple visualization of nullity by column.

Dataset B

A simple visualization of nullity by column.

Dataset A

The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Dataset B

The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
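
To make the two views concrete, the sketch below computes a nullity matrix (one boolean per cell) and the per-column missing counts for three rows taken from the Dataset A sample in this report, restricted to four columns. It uses plain Python so that it stays self-contained; the function names are hypothetical, `None` stands in for the `NaN` shown in the sample, and this is not how the report's plots are produced.

```python
# Three rows from the Dataset A sample, limited to four columns.
# None stands in for the NaN values shown in the report.
rows = [
    {"PassengerId": 512, "Age": None, "Cabin": None, "Embarked": "S"},
    {"PassengerId": 463, "Age": 47.0, "Cabin": "E63", "Embarked": "S"},
    {"PassengerId": 339, "Age": 45.0, "Cabin": None, "Embarked": "S"},
]
columns = list(rows[0])


def nullity_matrix(rows, columns):
    """Return one list per row; True marks a missing cell."""
    return [[row[col] is None for col in columns] for row in rows]


def nullity_by_column(rows, columns):
    """Return the number of missing cells in each column."""
    matrix = nullity_matrix(rows, columns)
    return {col: sum(r[i] for r in matrix) for i, col in enumerate(columns)}


matrix = nullity_matrix(rows, columns)
missing_per_column = nullity_by_column(rows, columns)
print(missing_per_column)
```

On the full datasets the same per-column counts would reproduce the alerts at the top of the report, for example 91 missing values for Age.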

Sample

Dataset A

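(index) | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
511 | 512 | 0 | 3 | Webber, Mr. James | male | NaN | 0 | 0 | SOTON/OQ 3101316 | 8.0500 | NaN | S
235 | 236 | 0 | 3 | Harknett, Miss. Alice Phoebe | female | NaN | 0 | 0 | W./C. 6609 | 7.5500 | NaN | S
542 | 543 | 0 | 3 | Andersson, Miss. Sigrid Elisabeth | female | 11.0 | 4 | 2 | 347082 | 31.2750 | NaN | S
788 | 789 | 1 | 3 | Dean, Master. Bertram Vere | male | 1.0 | 1 | 2 | C.A. 2315 | 20.5750 | NaN | S
338 | 339 | 1 | 3 | Dahl, Mr. Karl Edwart | male | 45.0 | 0 | 0 | 7598 | 8.0500 | NaN | S
112 | 113 | 0 | 3 | Barton, Mr. David John | male | 22.0 | 0 | 0 | 324669 | 8.0500 | NaN | S
462 | 463 | 0 | 1 | Gee, Mr. Arthur H | male | 47.0 | 0 | 0 | 111320 | 38.5000 | E63 | S
159 | 160 | 0 | 3 | Sage, Master. Thomas Henry | male | NaN | 8 | 2 | CA. 2343 | 69.5500 | NaN | S
846 | 847 | 0 | 3 | Sage, Mr. Douglas Bullen | male | NaN | 8 | 2 | CA. 2343 | 69.5500 | NaN | S
493 | 494 | 0 | 1 | Artagaveytia, Mr. Ramon | male | 71.0 | 0 | 0 | PC 17609 | 49.5042 | NaN | C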
49349401Artagaveytia, Mr. Ramonmale71.000PC 1760949.5042NaNC

Dataset B

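(index) | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
498 | 499 | 0 | 1 | Allison, Mrs. Hudson J C (Bessie Waldo Daniels) | female | 25.0 | 1 | 2 | 113781 | 151.5500 | C22 C26 | S
870 | 871 | 0 | 3 | Balkic, Mr. Cerin | male | 26.0 | 0 | 0 | 349248 | 7.8958 | NaN | S
852 | 853 | 0 | 3 | Boulos, Miss. Nourelain | female | 9.0 | 1 | 1 | 2678 | 15.2458 | NaN | C
528 | 529 | 0 | 3 | Salonen, Mr. Johan Werner | male | 39.0 | 0 | 0 | 3101296 | 7.9250 | NaN | S
629 | 630 | 0 | 3 | O'Connell, Mr. Patrick D | male | NaN | 0 | 0 | 334912 | 7.7333 | NaN | Q
31 | 32 | 1 | 1 | Spencer, Mrs. William Augustus (Marie Eugenie) | female | NaN | 1 | 0 | PC 17569 | 146.5208 | B78 | C
662 | 663 | 0 | 1 | Colley, Mr. Edward Pomeroy | male | 47.0 | 0 | 0 | 5727 | 25.5875 | E58 | S
828 | 829 | 1 | 3 | McCormack, Mr. Thomas Joseph | male | NaN | 0 | 0 | 367228 | 7.7500 | NaN | Q
614 | 615 | 0 | 3 | Brocklebank, Mr. William Alfred | male | 35.0 | 0 | 0 | 364512 | 8.0500 | NaN | S
526 | 527 | 1 | 2 | Ridsdale, Miss. Lucy | female | 50.0 | 0 | 0 | W./C. 14258 | 10.5000 | NaN | S
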

Dataset A

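(index) | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
667 | 668 | 0 | 3 | Rommetvedt, Mr. Knud Paust | male | NaN | 0 | 0 | 312993 | 7.7750 | NaN | S
306 | 307 | 1 | 1 | Fleming, Miss. Margaret | female | NaN | 0 | 0 | 17421 | 110.8833 | NaN | C
176 | 177 | 0 | 3 | Lefebre, Master. Henry Forbes | male | NaN | 3 | 1 | 4133 | 25.4667 | NaN | S
603 | 604 | 0 | 3 | Torber, Mr. Ernst William | male | 44.0 | 0 | 0 | 364511 | 8.0500 | NaN | S
64 | 65 | 0 | 1 | Stewart, Mr. Albert A | male | NaN | 0 | 0 | PC 17605 | 27.7208 | NaN | C
139 | 140 | 0 | 1 | Giglio, Mr. Victor | male | 24.0 | 0 | 0 | PC 17593 | 79.2000 | B86 | C
532 | 533 | 0 | 3 | Elias, Mr. Joseph Jr | male | 17.0 | 1 | 1 | 2690 | 7.2292 | NaN | C
314 | 315 | 0 | 2 | Hart, Mr. Benjamin | male | 43.0 | 1 | 1 | F.C.C. 13529 | 26.2500 | NaN | S
344 | 345 | 0 | 2 | Fox, Mr. Stanley Hubert | male | 36.0 | 0 | 0 | 229236 | 13.0000 | NaN | S
340 | 341 | 1 | 2 | Navratil, Master. Edmond Roger | male | 2.0 | 1 | 1 | 230080 | 26.0000 | F2 | S
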

Dataset B

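(index) | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
55 | 56 | 1 | 1 | Woolner, Mr. Hugh | male | NaN | 0 | 0 | 19947 | 35.5000 | C52 | S
272 | 273 | 1 | 2 | Mellinger, Mrs. (Elizabeth Anne Maidment) | female | 41.0 | 0 | 1 | 250644 | 19.5000 | NaN | S
817 | 818 | 0 | 2 | Mallet, Mr. Albert | male | 31.0 | 1 | 1 | S.C./PARIS 2079 | 37.0042 | NaN | C
490 | 491 | 0 | 3 | Hagland, Mr. Konrad Mathias Reiersen | male | NaN | 1 | 0 | 65304 | 19.9667 | NaN | S
15 | 16 | 1 | 2 | Hewlett, Mrs. (Mary D Kingcome) | female | 55.0 | 0 | 0 | 248706 | 16.0000 | NaN | S
109 | 110 | 1 | 3 | Moran, Miss. Bertha | female | NaN | 1 | 0 | 371110 | 24.1500 | NaN | Q
138 | 139 | 0 | 3 | Osen, Mr. Olaf Elon | male | 16.0 | 0 | 0 | 7534 | 9.2167 | NaN | S
569 | 570 | 1 | 3 | Jonsson, Mr. Carl | male | 32.0 | 0 | 0 | 350417 | 7.8542 | NaN | S
677 | 678 | 1 | 3 | Turja, Miss. Anna Sofia | female | 18.0 | 0 | 0 | 4138 | 9.8417 | NaN | S
753 | 754 | 0 | 3 | Jonkoff, Mr. Lalio | male | 23.0 | 0 | 0 | 349204 | 7.8958 | NaN | S
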

Duplicate rows

Dataset A

PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | # duplicates
Dataset does not contain duplicate rows.

Dataset B

PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | # duplicates
Dataset does not contain duplicate rows.